Parallel buses had three major timing issues that prevented them from going much faster:

| Problem | Why it limits speed | PCIe solution |
| --- | --- | --- |
| Flight time (signal propagation delay) | Data must arrive within one clock period. At high clock speeds there is not enough margin. | PCIe embeds the clock in the data stream. The receiver recovers the clock from the data, so flight time no longer has to fit inside a clock period. |
| Clock skew (different clock arrival times at transmitter and receiver) | Reduces the timing budget and risks sampling errors. | Eliminated: the recovered clock is aligned with the data. |
| Signal skew (data lines arrive at slightly different times) | The receiver must wait for the slowest signal before latching, which limits clock speed. | Gone within a lane: each lane carries its bits serially, so there are no parallel data lines to skew against each other. On multi-lane links the receiver deskews the lanes automatically. |

This is why PCIe can scale to 2.5 GT/s, 5.0 GT/s, 8 GT/s, and beyond, rates that are impractical for a parallel bus such as PCI or PCI-X.
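
To make the timing squeeze on a parallel bus concrete, here is a minimal sketch of a simplified timing budget: the clock period minus flight time, clock skew, signal skew, and receiver setup time. The function name `timing_margin_ns`, the setup-time term, and every delay value are assumptions chosen only to illustrate the trend; they are not figures from PCI, PCI-X, or any real board.

```python
def timing_margin_ns(clock_mhz, flight_ns, clock_skew_ns, signal_skew_ns, setup_ns):
    """Slack left in one clock period after subtracting each delay.

    Simplified model: every delay must fit inside a single clock period.
    A negative result means the data cannot be latched reliably.
    """
    period_ns = 1000.0 / clock_mhz
    return period_ns - (flight_ns + clock_skew_ns + signal_skew_ns + setup_ns)


# Hypothetical delays (nanoseconds), for illustration only.
delays = dict(flight_ns=3.0, clock_skew_ns=1.0, signal_skew_ns=1.0, setup_ns=2.0)

for mhz in (33, 66, 133, 266):
    margin = timing_margin_ns(mhz, **delays)
    print(f"{mhz:>4} MHz: margin = {margin:+.2f} ns")
```

The delays stay fixed while the clock period shrinks, so with these assumed values the margin is comfortable at 33 MHz, nearly gone at 133 MHz, and negative at 266 MHz. An embedded clock removes the requirement that all of these delays fit inside one period.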

Bandwidth Math

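PCIe combines four factors: the per-lane bit rate, the payload efficiency of the line encoding, the number of lanes, and full-duplex operation (each lane transmits and receives at the same time).

So effective bandwidth = bit rate × payload efficiency × lane count × 2 (duplex). The result is in bits per second; divide by 8 for bytes per second, and drop the × 2 for the per-direction figure.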

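Gen1 runs at 2.5 GT/s with 8b/10b encoding (80% efficiency), which gives 0.25 GB/s per direction per lane. Gen2 doubles this (5.0 GT/s → 0.5 GB/s per direction per lane).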

Gen3 pairs 8 GT/s with 128b/130b encoding (~98.5% efficiency) to nearly double bandwidth again, to about 0.985 GB/s per direction per lane.
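
The bandwidth formula can be checked with a short sketch. The `GENERATIONS` table and the `bandwidth_gbytes` helper are illustrative names, not part of any PCIe tool or library; the 8b/10b encoding for Gen1 and Gen2 is assumed from the 0.5 GB/s Gen2 figure quoted above. Exact fractions are used so the arithmetic does not pick up rounding noise.

```python
from fractions import Fraction

# (transfer rate in GT/s, payload efficiency of the line encoding)
GENERATIONS = {
    1: (Fraction(5, 2), Fraction(8, 10)),     # 2.5 GT/s, 8b/10b
    2: (Fraction(5), Fraction(8, 10)),        # 5.0 GT/s, 8b/10b
    3: (Fraction(8), Fraction(128, 130)),     # 8 GT/s, 128b/130b
}


def bandwidth_gbytes(gen, lanes=1, both_directions=False):
    """Effective bandwidth in GB/s: rate x efficiency x lanes / 8 bits per byte."""
    rate, efficiency = GENERATIONS[gen]
    gbytes = rate * efficiency * lanes / 8
    if both_directions:
        gbytes *= 2  # full duplex: count transmit and receive together
    return float(gbytes)


for gen in GENERATIONS:
    print(f"Gen{gen} x1: {bandwidth_gbytes(gen):.3f} GB/s per direction")
print(f"Gen3 x16: {bandwidth_gbytes(3, lanes=16):.2f} GB/s per direction")
```

Passing `both_directions=True` applies the × 2 duplex factor from the formula; leaving it off gives the per-direction numbers quoted in the text.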